DACSS 601: Data Science Fundamentals - FALL 2022

HW 2


On this page

  • What are we looking at today?
  • Tidy the Data
    • Rename Variables
    • Remove Unneeded Variables
    • Recode Continent
    • Is voting compulsory in your country? Does your country enforce compulsory voting?
    • Recode Voter ID Type laws
    • Types of voter registration
  • Explain the Data
  • Questions to Ask Ourselves
  • Visualize the Data

HW 2

hw2
kristin_abijaoude
voteridlaws
More data wrangling: pivoting
Author

Kristin Abijaoude

Published

October 9, 2022

Code
library(dplyr)

Attaching package: 'dplyr'
The following objects are masked from 'package:stats':

    filter, lag
The following objects are masked from 'package:base':

    intersect, setdiff, setequal, union
Code
library(tidyverse)
── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
✔ ggplot2 3.4.0     ✔ purrr   0.3.5
✔ tibble  3.1.8     ✔ stringr 1.5.0
✔ tidyr   1.2.1     ✔ forcats 0.5.2
✔ readr   2.1.3     
Warning: package 'ggplot2' was built under R version 4.2.2
Warning: package 'stringr' was built under R version 4.2.2
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
Code
# Setup: hide warnings and messages in later chunks
# (tidyverse, loaded above, already attaches ggplot2)
knitr::opts_chunk$set(echo = TRUE, warning = FALSE, message = FALSE)

What are we looking at today?

Today, we are looking at voter ID laws by country and by US state. I downloaded the dataset from Tom Barton of the Department of Politics, International Relations and Philosophy at Royal Holloway, University of London. I found it through Data Is Plural, a newsletter and archive of interesting datasets.

To read the CSV file, we use the read_csv() function from readr, which is loaded with the tidyverse. The chunk below shows the call.

Code
Voter_ID <- read_csv("_data/dataverse_files/cvil_22_09_08.csv")
Voter_ID
# A tibble: 249 × 16
   cntry     cntry…¹ id_type id_ty…² num_id exhaust min_id law_yr cmp_vt cmp_enf
   <chr>     <chr>     <dbl> <chr>    <dbl>   <dbl>  <dbl>  <dbl>  <dbl>   <dbl>
 1 Afghanis… AFG           3 Photo …    1         1      1   2016      0       0
 2 Albania   ALB           3 Photo …    2         1      1   2012      0       0
 3 Algeria   DZA           3 Photo …    1.5       0      1   2012      0       0
 4 Andorra   AND           3 Photo …    1.5       0      1   2014      0       0
 5 Angola    AGO           2 Non Ph…    1         1      1   2004      0       0
 6 Anguilla  AIA           1 Basic …    0         1      0   2019      0       0
 7 Antigua … ATG           3 Photo …    1         1      1   2001      0       0
 8 Argentina ARG           3 Photo …    5         1      1   2012      1       1
 9 Armenia   ARM           3 Photo …    3         1      1   2016      0       0
10 Aruba     ABW           3 Photo …    4         1      2   1987      0       0
# … with 239 more rows, 6 more variables: us_dum <dbl>, reg_dev <dbl>,
#   reg_law <dbl>, reg_lab <chr>, continent <dbl>, cmp_id <chr>, and
#   abbreviated variable names ¹​cntry_cd, ²​id_type_lab
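The printout above lists cmp_id as `<chr>` even though the column holds 0/1 codes. The recode step further down suggests why: missing entries were exported as the spreadsheet string `#N/A`. Assuming that is the case, a minimal sketch of one alternative is to pass that string to read_csv()'s `na` argument, so readr stores a real NA and parses the column as numeric. The three-row CSV here is made up for illustration.

```r
library(readr)

# Hypothetical three-row CSV mimicking cmp_id, with a missing value stored as "#N/A"
tmp <- tempfile(fileext = ".csv")
writeLines(c("cntry,cmp_id", "Albania,1", "Anguilla,#N/A", "Aruba,0"), tmp)

# Default na = c("", "NA"): "#N/A" stays as text, so cmp_id is read as character
default_read <- read_csv(tmp, show_col_types = FALSE)

# Adding "#N/A" to na turns it into a real NA, so cmp_id is parsed as numeric
clean_read <- read_csv(tmp, na = c("", "NA", "#N/A"), show_col_types = FALSE)
```

If the file were read this way, the later recode of cmp_id would see NA rather than `"#N/A"`. The "No Data" label would then have to come from NA handling instead.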

Voter ID laws differ from place to place. This dataset covers voting laws from around the globe, from our own commonwealth of Massachusetts to Afghanistan, Norway, and Zimbabwe. Below, we check the dimensions of the dataset and the column names, which we will fix later. There are 249 rows and 16 columns. Each row represents a country, territory, or US state. The columns cover identifiers such as the country name and code, along with features of each place's voting laws, from compulsory national ID cards to voter ID requirements.

Code
# Components of the Dataset

nrow(Voter_ID)
[1] 249
Code
ncol(Voter_ID)
[1] 16
Code
dim(Voter_ID)
[1] 249  16
Code
colnames(Voter_ID)
 [1] "cntry"       "cntry_cd"    "id_type"     "id_type_lab" "num_id"     
 [6] "exhaust"     "min_id"      "law_yr"      "cmp_vt"      "cmp_enf"    
[11] "us_dum"      "reg_dev"     "reg_law"     "reg_lab"     "continent"  
[16] "cmp_id"     

Tidy the Data

To begin the tidying process, we will rename the columns. The dataset came as a ZIP file that included a PDF codebook listing each column and its label, and I use that PDF as my guide while tidying. First, summary() gives a quick overview of each column's type, range, and number of missing values.

Code
summary(Voter_ID)
    cntry             cntry_cd            id_type      id_type_lab       
 Length:249         Length:249         Min.   :1.000   Length:249        
 Class :character   Class :character   1st Qu.:2.000   Class :character  
 Mode  :character   Mode  :character   Median :3.000   Mode  :character  
                                       Mean   :2.581                     
                                       3rd Qu.:3.000                     
                                       Max.   :3.000                     
                                       NA's   :3                         
     num_id          exhaust           min_id           law_yr    
 Min.   : 0.000   Min.   :0.0000   Min.   :0.0000   Min.   :1918  
 1st Qu.: 1.000   1st Qu.:0.0000   1st Qu.:1.0000   1st Qu.:1998  
 Median : 1.500   Median :1.0000   Median :1.0000   Median :2006  
 Mean   : 2.935   Mean   :0.7353   Mean   :0.9076   Mean   :2004  
 3rd Qu.: 3.375   3rd Qu.:1.0000   3rd Qu.:1.0000   3rd Qu.:2013  
 Max.   :54.000   Max.   :1.0000   Max.   :2.0000   Max.   :2021  
 NA's   :11       NA's   :11       NA's   :11       NA's   :24    
     cmp_vt          cmp_enf            us_dum          reg_dev      
 Min.   :0.0000   Min.   :0.00000   Min.   :0.0000   Min.   : 0.000  
 1st Qu.:0.0000   1st Qu.:0.00000   1st Qu.:0.0000   1st Qu.: 0.000  
 Median :0.0000   Median :0.00000   Median :0.0000   Median : 0.000  
 Mean   :0.1084   Mean   :0.05622   Mean   :0.2008   Mean   : 5.723  
 3rd Qu.:0.0000   3rd Qu.:0.00000   3rd Qu.:0.0000   3rd Qu.: 0.000  
 Max.   :1.0000   Max.   :1.00000   Max.   :1.0000   Max.   :53.000  
                                                                     
    reg_law        reg_lab            continent       cmp_id         
 Min.   :1.000   Length:249         Min.   :1.00   Length:249        
 1st Qu.:1.000   Class :character   1st Qu.:2.00   Class :character  
 Median :3.000   Mode  :character   Median :2.00   Mode  :character  
 Mean   :2.167                      Mean   :2.53                     
 3rd Qu.:3.000                      3rd Qu.:4.00                     
 Max.   :3.000                      Max.   :5.00                     
 NA's   :4                                                           

Rename Variables

Code
Voter_ID <- Voter_ID %>%
  rename("Country Name" = cntry,
         "Country ID" = cntry_cd,
         "Voter ID Law Type" = id_type,
         "Number of Different IDs Allowed to Prove Identity" = num_id,
         "Does the electoral law provide an exhaustive list of different IDs voters can present?" = exhaust,
         "Minimum Number of IDs Required by Law" = min_id,
         "Year of Current Law Enforced" = law_yr,
         "Does the country have compulsory voting?" = cmp_vt,
         "Does the country enforce compulsory voting?" = cmp_enf,
         "Does the country have compulsory national ID cards?" = cmp_id,
         "Registration Type Law" = reg_law,
         "Continent" = continent)
Voter_ID
# A tibble: 249 × 16
   Country Nam…¹ Count…² Voter…³ id_ty…⁴ Numbe…⁵ Does …⁶ Minim…⁷ Year …⁸ Does …⁹
   <chr>         <chr>     <dbl> <chr>     <dbl>   <dbl>   <dbl>   <dbl>   <dbl>
 1 Afghanistan   AFG           3 Photo …     1         1       1    2016       0
 2 Albania       ALB           3 Photo …     2         1       1    2012       0
 3 Algeria       DZA           3 Photo …     1.5       0       1    2012       0
 4 Andorra       AND           3 Photo …     1.5       0       1    2014       0
 5 Angola        AGO           2 Non Ph…     1         1       1    2004       0
 6 Anguilla      AIA           1 Basic …     0         1       0    2019       0
 7 Antigua and … ATG           3 Photo …     1         1       1    2001       0
 8 Argentina     ARG           3 Photo …     5         1       1    2012       1
 9 Armenia       ARM           3 Photo …     3         1       1    2016       0
10 Aruba         ABW           3 Photo …     4         1       2    1987       0
# … with 239 more rows, 7 more variables:
#   `Does the country enforce compulsory voting?` <dbl>, us_dum <dbl>,
#   reg_dev <dbl>, `Registration Type Law` <dbl>, reg_lab <chr>,
#   Continent <dbl>,
#   `Does the country have compulsory national ID cards?` <chr>, and
#   abbreviated variable names ¹​`Country Name`, ²​`Country ID`,
#   ³​`Voter ID Law Type`, ⁴​id_type_lab, …
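The long, question-style names read well in a table, but every later use in code has to wrap them in backticks. If you would rather keep short names while wrangling, one option is to store the old-to-new mapping in a single named vector and pass it to rename() with all_of(). This is a minimal sketch. The short new names (country, country_code, law_year) are hypothetical choices, and the two rows are made up.

```r
library(dplyr)

# Made-up two-row tibble using a few of the original column names
raw <- tibble(cntry = c("Albania", "Aruba"),
              cntry_cd = c("ALB", "ABW"),
              law_yr = c(2012, 1987))

# Lookup vector: names are the NEW column names, values are the OLD ones
lookup <- c(country = "cntry", country_code = "cntry_cd", law_year = "law_yr")

# all_of() makes rename() use every pair in the lookup vector
renamed <- rename(raw, all_of(lookup))
```

Because the mapping lives in one vector, the same approach can rename to the descriptive labels at the very end, just before the tables are displayed.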

Notice that I did not rename every variable, because some are mere repeats of other variables (id_type_lab, for instance, is a text label for id_type). Since they are unnecessary, I remove them with select(). Inside select(), a minus sign in front of c() drops the variables listed in c().

Remove Unneeded Variables

Code
Voter_ID <- Voter_ID %>%
  select(-c(id_type_lab, us_dum, reg_dev, reg_lab))
Voter_ID
# A tibble: 249 × 12
   Country Nam…¹ Count…² Voter…³ Numbe…⁴ Does …⁵ Minim…⁶ Year …⁷ Does …⁸ Does …⁹
   <chr>         <chr>     <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>
 1 Afghanistan   AFG           3     1         1       1    2016       0       0
 2 Albania       ALB           3     2         1       1    2012       0       0
 3 Algeria       DZA           3     1.5       0       1    2012       0       0
 4 Andorra       AND           3     1.5       0       1    2014       0       0
 5 Angola        AGO           2     1         1       1    2004       0       0
 6 Anguilla      AIA           1     0         1       0    2019       0       0
 7 Antigua and … ATG           3     1         1       1    2001       0       0
 8 Argentina     ARG           3     5         1       1    2012       1       1
 9 Armenia       ARM           3     3         1       1    2016       0       0
10 Aruba         ABW           3     4         1       2    1987       0       0
# … with 239 more rows, 3 more variables: `Registration Type Law` <dbl>,
#   Continent <dbl>,
#   `Does the country have compulsory national ID cards?` <chr>, and
#   abbreviated variable names ¹​`Country Name`, ²​`Country ID`,
#   ³​`Voter ID Law Type`, ⁴​`Number of Different IDs Allowed to Prove Identity`,
#   ⁵​`Does the electoral law provide an exhaustive list of different IDs voters can present?`,
#   ⁶​`Minimum Number of IDs Required by Law`, …
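Before dropping a column as a repeat, it can help to confirm that it carries no extra information. Assume id_type_lab is meant to be a text label for id_type. Then the label column is redundant when codes and labels pair up one-to-one. A minimal sketch of that check, on hypothetical rows and labels:

```r
library(dplyr)

# Hypothetical rows: a numeric code plus a text label that should mirror it
ids <- tibble(id_type = c(3, 3, 2, 1),
              id_type_lab = c("Photo ID", "Photo ID", "Non-photo ID", "Basic details"))

# One row per distinct code/label pair
pairs <- distinct(ids, id_type, id_type_lab)

# Redundant if every code has one label and every label has one code
is_redundant <- n_distinct(pairs$id_type) == nrow(pairs) &&
  n_distinct(pairs$id_type_lab) == nrow(pairs)
```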

Next, we recode the numeric codes in the dataset into descriptive labels so that it is clear what each number means. We go through the variables that need recoding with the mutate() and recode() functions, starting with continents.

Recode Continent

Code
Voter_ID <- Voter_ID %>%
  mutate(Continent=recode(Continent, 
                            `1` = "Africa",
                            `2` = "Americas",
                            `3` = "Asia",
                            `4` = "Europe",
                            `5` = "Oceania"))
Voter_ID
# A tibble: 249 × 12
   Country Nam…¹ Count…² Voter…³ Numbe…⁴ Does …⁵ Minim…⁶ Year …⁷ Does …⁸ Does …⁹
   <chr>         <chr>     <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>
 1 Afghanistan   AFG           3     1         1       1    2016       0       0
 2 Albania       ALB           3     2         1       1    2012       0       0
 3 Algeria       DZA           3     1.5       0       1    2012       0       0
 4 Andorra       AND           3     1.5       0       1    2014       0       0
 5 Angola        AGO           2     1         1       1    2004       0       0
 6 Anguilla      AIA           1     0         1       0    2019       0       0
 7 Antigua and … ATG           3     1         1       1    2001       0       0
 8 Argentina     ARG           3     5         1       1    2012       1       1
 9 Armenia       ARM           3     3         1       1    2016       0       0
10 Aruba         ABW           3     4         1       2    1987       0       0
# … with 239 more rows, 3 more variables: `Registration Type Law` <dbl>,
#   Continent <chr>,
#   `Does the country have compulsory national ID cards?` <chr>, and
#   abbreviated variable names ¹​`Country Name`, ²​`Country ID`,
#   ³​`Voter ID Law Type`, ⁴​`Number of Different IDs Allowed to Prove Identity`,
#   ⁵​`Does the electoral law provide an exhaustive list of different IDs voters can present?`,
#   ⁶​`Minimum Number of IDs Required by Law`, …
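recode() returns a plain character column. A minimal sketch of an alternative uses base R's factor(), which maps the codes 1 to 5 to the same labels but stores the result as a categorical variable with a fixed set of levels. One benefit is that table() and many plots will show all five continents even if one has no rows. The input codes below are made up.

```r
# Made-up continent codes, including one missing value
continent_code <- c(1, 4, 2, 5, NA, 3)

# levels lists the codes; labels gives the text shown for each code, in the same order
continent <- factor(continent_code,
                    levels = 1:5,
                    labels = c("Africa", "Americas", "Asia", "Europe", "Oceania"))

# Counts in level order, with NA tallied separately
table(continent, useNA = "ifany")
```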

Next, we will recode the values in compulsory voting laws.

Is voting compulsory in your country? Does your country enforce compulsory voting?

Code
# The pipe already passes Voter_ID to mutate(), so it is not repeated inside
Voter_ID <- Voter_ID %>%
  mutate(`Does the country have compulsory voting?` = recode(`Does the country have compulsory voting?`, `0` = "No", `1` = "Yes"))

Voter_ID <- Voter_ID %>%
  mutate(`Does the country enforce compulsory voting?` = recode(`Does the country enforce compulsory voting?`, `0` = "No", `1` = "Yes"))
Voter_ID
# A tibble: 249 × 12
   Country Nam…¹ Count…² Voter…³ Numbe…⁴ Does …⁵ Minim…⁶ Year …⁷ Does …⁸ Does …⁹
   <chr>         <chr>     <dbl>   <dbl>   <dbl>   <dbl>   <dbl> <chr>   <chr>  
 1 Afghanistan   AFG           3     1         1       1    2016 No      No     
 2 Albania       ALB           3     2         1       1    2012 No      No     
 3 Algeria       DZA           3     1.5       0       1    2012 No      No     
 4 Andorra       AND           3     1.5       0       1    2014 No      No     
 5 Angola        AGO           2     1         1       1    2004 No      No     
 6 Anguilla      AIA           1     0         1       0    2019 No      No     
 7 Antigua and … ATG           3     1         1       1    2001 No      No     
 8 Argentina     ARG           3     5         1       1    2012 Yes     Yes    
 9 Armenia       ARM           3     3         1       1    2016 No      No     
10 Aruba         ABW           3     4         1       2    1987 No      No     
# … with 239 more rows, 3 more variables: `Registration Type Law` <dbl>,
#   Continent <chr>,
#   `Does the country have compulsory national ID cards?` <chr>, and
#   abbreviated variable names ¹​`Country Name`, ²​`Country ID`,
#   ³​`Voter ID Law Type`, ⁴​`Number of Different IDs Allowed to Prove Identity`,
#   ⁵​`Does the electoral law provide an exhaustive list of different IDs voters can present?`,
#   ⁶​`Minimum Number of IDs Required by Law`, …
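The two recodes above apply the same 0/1 to No/Yes mapping to two columns. A minimal sketch of doing both in one step uses dplyr's across(). It works on the original short column names, cmp_vt and cmp_enf, and on a few made-up rows that reuse values from the printout.

```r
library(dplyr)

votes <- tibble(cntry = c("Afghanistan", "Argentina", "Armenia"),
                cmp_vt = c(0, 1, 0),
                cmp_enf = c(0, 1, 0))

# Apply the same 0 -> "No", 1 -> "Yes" recode to every listed column
votes_recoded <- votes %>%
  mutate(across(c(cmp_vt, cmp_enf), ~ recode(.x, `0` = "No", `1` = "Yes")))
```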

Next, we recode the type of voter ID law, that is, what a voter must present to prove their identity. The dataset uses three categories: basic personal details, non-photo ID required or requested, and photo ID required.

Recode Voter ID Type laws

Code
Voter_ID <- Voter_ID %>%
  mutate(`Voter ID Law Type` = recode(`Voter ID Law Type`,
                                      `1` = "Basic Personal Details",
                                      `2` = "Non-Photo ID Required or Requested",
                                      `3` = "Photo ID Required"))

Voter_ID
# A tibble: 249 × 12
   Country Nam…¹ Count…² Voter…³ Numbe…⁴ Does …⁵ Minim…⁶ Year …⁷ Does …⁸ Does …⁹
   <chr>         <chr>   <chr>     <dbl>   <dbl>   <dbl>   <dbl> <chr>   <chr>  
 1 Afghanistan   AFG     Photo …     1         1       1    2016 No      No     
 2 Albania       ALB     Photo …     2         1       1    2012 No      No     
 3 Algeria       DZA     Photo …     1.5       0       1    2012 No      No     
 4 Andorra       AND     Photo …     1.5       0       1    2014 No      No     
 5 Angola        AGO     Non-Ph…     1         1       1    2004 No      No     
 6 Anguilla      AIA     Basic …     0         1       0    2019 No      No     
 7 Antigua and … ATG     Photo …     1         1       1    2001 No      No     
 8 Argentina     ARG     Photo …     5         1       1    2012 Yes     Yes    
 9 Armenia       ARM     Photo …     3         1       1    2016 No      No     
10 Aruba         ABW     Photo …     4         1       2    1987 No      No     
# … with 239 more rows, 3 more variables: `Registration Type Law` <dbl>,
#   Continent <chr>,
#   `Does the country have compulsory national ID cards?` <chr>, and
#   abbreviated variable names ¹​`Country Name`, ²​`Country ID`,
#   ³​`Voter ID Law Type`, ⁴​`Number of Different IDs Allowed to Prove Identity`,
#   ⁵​`Does the electoral law provide an exhaustive list of different IDs voters can present?`,
#   ⁶​`Minimum Number of IDs Required by Law`, …
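This column has 3 missing values (see the NA's row for id_type in the summary), and recode() leaves them as NA. A minimal sketch of labeling them in the same step uses recode()'s `.missing` argument. The input vector is made up.

```r
library(dplyr)

id_type <- c(3, 2, NA, 1)

# .missing supplies the replacement for NA inputs
id_type_label <- recode(id_type,
                        `1` = "Basic Personal Details",
                        `2` = "Non-Photo ID Required or Requested",
                        `3` = "Photo ID Required",
                        .missing = "No Data")
```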

Another voter ID question is whether the electoral law provides an exhaustive list of the IDs voters can present, that is, whether the law spells out every acceptable form of ID. A "Yes" means voters are limited to the documents named in the law. A "No" means the law does not enumerate every acceptable document.

Code
Voter_ID <- Voter_ID %>%
  mutate(`Does the electoral law provide an exhaustive list of different IDs voters can present?`
         = recode(`Does the electoral law provide an exhaustive list of different IDs voters can present?`,
                  `0` = "No",
                  `1` = "Yes"))

Voter_ID
# A tibble: 249 × 12
   Country Nam…¹ Count…² Voter…³ Numbe…⁴ Does …⁵ Minim…⁶ Year …⁷ Does …⁸ Does …⁹
   <chr>         <chr>   <chr>     <dbl> <chr>     <dbl>   <dbl> <chr>   <chr>  
 1 Afghanistan   AFG     Photo …     1   Yes           1    2016 No      No     
 2 Albania       ALB     Photo …     2   Yes           1    2012 No      No     
 3 Algeria       DZA     Photo …     1.5 No            1    2012 No      No     
 4 Andorra       AND     Photo …     1.5 No            1    2014 No      No     
 5 Angola        AGO     Non-Ph…     1   Yes           1    2004 No      No     
 6 Anguilla      AIA     Basic …     0   Yes           0    2019 No      No     
 7 Antigua and … ATG     Photo …     1   Yes           1    2001 No      No     
 8 Argentina     ARG     Photo …     5   Yes           1    2012 Yes     Yes    
 9 Armenia       ARM     Photo …     3   Yes           1    2016 No      No     
10 Aruba         ABW     Photo …     4   Yes           2    1987 No      No     
# … with 239 more rows, 3 more variables: `Registration Type Law` <dbl>,
#   Continent <chr>,
#   `Does the country have compulsory national ID cards?` <chr>, and
#   abbreviated variable names ¹​`Country Name`, ²​`Country ID`,
#   ³​`Voter ID Law Type`, ⁴​`Number of Different IDs Allowed to Prove Identity`,
#   ⁵​`Does the electoral law provide an exhaustive list of different IDs voters can present?`,
#   ⁶​`Minimum Number of IDs Required by Law`, …

Next, we recode whether the country has compulsory national ID cards. Where voters must show some form of ID, a national ID card can serve as that proof at the polling station. Missing entries in this column appear as the text "#N/A", so we label them "No Data".

Code
Voter_ID <- Voter_ID %>%
  mutate(`Does the country have compulsory national ID cards?`
         = recode(`Does the country have compulsory national ID cards?`,
                  `0` = "No",
                  `1` = "Yes",
                  "#N/A" = "No Data"))

Voter_ID
# A tibble: 249 × 12
   Country Nam…¹ Count…² Voter…³ Numbe…⁴ Does …⁵ Minim…⁶ Year …⁷ Does …⁸ Does …⁹
   <chr>         <chr>   <chr>     <dbl> <chr>     <dbl>   <dbl> <chr>   <chr>  
 1 Afghanistan   AFG     Photo …     1   Yes           1    2016 No      No     
 2 Albania       ALB     Photo …     2   Yes           1    2012 No      No     
 3 Algeria       DZA     Photo …     1.5 No            1    2012 No      No     
 4 Andorra       AND     Photo …     1.5 No            1    2014 No      No     
 5 Angola        AGO     Non-Ph…     1   Yes           1    2004 No      No     
 6 Anguilla      AIA     Basic …     0   Yes           0    2019 No      No     
 7 Antigua and … ATG     Photo …     1   Yes           1    2001 No      No     
 8 Argentina     ARG     Photo …     5   Yes           1    2012 Yes     Yes    
 9 Armenia       ARM     Photo …     3   Yes           1    2016 No      No     
10 Aruba         ABW     Photo …     4   Yes           2    1987 No      No     
# … with 239 more rows, 3 more variables: `Registration Type Law` <dbl>,
#   Continent <chr>,
#   `Does the country have compulsory national ID cards?` <chr>, and
#   abbreviated variable names ¹​`Country Name`, ²​`Country ID`,
#   ³​`Voter ID Law Type`, ⁴​`Number of Different IDs Allowed to Prove Identity`,
#   ⁵​`Does the electoral law provide an exhaustive list of different IDs voters can present?`,
#   ⁶​`Minimum Number of IDs Required by Law`, …

Lastly, let’s recode voter registration, which is in three categories:

  1. Laissez-Faire: registering to vote is optional
  2. Assisted: you can register to vote through public services
  3. Automatic: you are automatically registered to vote once you’ve reached voting age

Types of voter registration

Code
Voter_ID <- Voter_ID %>%
  mutate(`Registration Type Law`
         = recode(`Registration Type Law`,
                  `1` = "Laissez-Faire Registration",
                  `2` = "Assisted Registration",
                  `3` = "Automatic Registration"))

# Sanity Check
Voter_ID
# A tibble: 249 × 12
   Country Nam…¹ Count…² Voter…³ Numbe…⁴ Does …⁵ Minim…⁶ Year …⁷ Does …⁸ Does …⁹
   <chr>         <chr>   <chr>     <dbl> <chr>     <dbl>   <dbl> <chr>   <chr>  
 1 Afghanistan   AFG     Photo …     1   Yes           1    2016 No      No     
 2 Albania       ALB     Photo …     2   Yes           1    2012 No      No     
 3 Algeria       DZA     Photo …     1.5 No            1    2012 No      No     
 4 Andorra       AND     Photo …     1.5 No            1    2014 No      No     
 5 Angola        AGO     Non-Ph…     1   Yes           1    2004 No      No     
 6 Anguilla      AIA     Basic …     0   Yes           0    2019 No      No     
 7 Antigua and … ATG     Photo …     1   Yes           1    2001 No      No     
 8 Argentina     ARG     Photo …     5   Yes           1    2012 Yes     Yes    
 9 Armenia       ARM     Photo …     3   Yes           1    2016 No      No     
10 Aruba         ABW     Photo …     4   Yes           2    1987 No      No     
# … with 239 more rows, 3 more variables: `Registration Type Law` <chr>,
#   Continent <chr>,
#   `Does the country have compulsory national ID cards?` <chr>, and
#   abbreviated variable names ¹​`Country Name`, ²​`Country ID`,
#   ³​`Voter ID Law Type`, ⁴​`Number of Different IDs Allowed to Prove Identity`,
#   ⁵​`Does the electoral law provide an exhaustive list of different IDs voters can present?`,
#   ⁶​`Minimum Number of IDs Required by Law`, …

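Next, I replace the remaining NAs with the label "No Data". Assigning text into a numeric column converts the whole column to character, as the <chr> types in the output below show. Columns such as the year would therefore need to be converted back with as.numeric() before any arithmetic.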

Code
no_data <- "No Data"  # label used to fill missing values
Voter_ID$`Voter ID Law Type`[is.na(Voter_ID$`Voter ID Law Type`)] <- no_data

Voter_ID$`Number of Different IDs Allowed to Prove Identity`[is.na(Voter_ID$`Number of Different IDs Allowed to Prove Identity`)] <-no_data

Voter_ID$`Does the electoral law provide an exhaustive list of different IDs voters can present?`[is.na(Voter_ID$`Does the electoral law provide an exhaustive list of different IDs voters can present?`)] <- no_data

Voter_ID$`Minimum Number of IDs Required by Law`[is.na(Voter_ID$`Minimum Number of IDs Required by Law`)] <- no_data

Voter_ID$`Year of Current Law Enforced`[is.na(Voter_ID$`Year of Current Law Enforced`)] <- no_data

Voter_ID$`Registration Type Law`[is.na(Voter_ID$`Registration Type Law`)] <-no_data

Voter_ID
# A tibble: 249 × 12
   Country Nam…¹ Count…² Voter…³ Numbe…⁴ Does …⁵ Minim…⁶ Year …⁷ Does …⁸ Does …⁹
   <chr>         <chr>   <chr>   <chr>   <chr>   <chr>   <chr>   <chr>   <chr>  
 1 Afghanistan   AFG     Photo … 1       Yes     1       2016    No      No     
 2 Albania       ALB     Photo … 2       Yes     1       2012    No      No     
 3 Algeria       DZA     Photo … 1.5     No      1       2012    No      No     
 4 Andorra       AND     Photo … 1.5     No      1       2014    No      No     
 5 Angola        AGO     Non-Ph… 1       Yes     1       2004    No      No     
 6 Anguilla      AIA     Basic … 0       Yes     0       2019    No      No     
 7 Antigua and … ATG     Photo … 1       Yes     1       2001    No      No     
 8 Argentina     ARG     Photo … 5       Yes     1       2012    Yes     Yes    
 9 Armenia       ARM     Photo … 3       Yes     1       2016    No      No     
10 Aruba         ABW     Photo … 4       Yes     2       1987    No      No     
# … with 239 more rows, 3 more variables: `Registration Type Law` <chr>,
#   Continent <chr>,
#   `Does the country have compulsory national ID cards?` <chr>, and
#   abbreviated variable names ¹​`Country Name`, ²​`Country ID`,
#   ³​`Voter ID Law Type`, ⁴​`Number of Different IDs Allowed to Prove Identity`,
#   ⁵​`Does the electoral law provide an exhaustive list of different IDs voters can present?`,
#   ⁶​`Minimum Number of IDs Required by Law`, …
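As the column types above show, writing "No Data" into numeric columns such as Year of Current Law Enforced turns them into character columns. If you want to keep numbers numeric, a minimal sketch of an alternative fills only the character columns with tidyr's replace_na() and leaves numeric NAs as NA. The data are made up.

```r
library(dplyr)
library(tidyr)

laws <- tibble(id_type = c("Photo ID Required", NA),
               law_yr = c(2012, NA))

laws_filled <- laws %>%
  # where(is.character) selects only text columns; numeric ones are untouched
  mutate(across(where(is.character), ~ replace_na(.x, "No Data")))
```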
Code
Voter_ID <- relocate(Voter_ID, "Continent", .after = "Country ID")
  
Voter_ID
# A tibble: 249 × 12
   Country Nam…¹ Count…² Conti…³ Voter…⁴ Numbe…⁵ Does …⁶ Minim…⁷ Year …⁸ Does …⁹
   <chr>         <chr>   <chr>   <chr>   <chr>   <chr>   <chr>   <chr>   <chr>  
 1 Afghanistan   AFG     Asia    Photo … 1       Yes     1       2016    No     
 2 Albania       ALB     Europe  Photo … 2       Yes     1       2012    No     
 3 Algeria       DZA     Africa  Photo … 1.5     No      1       2012    No     
 4 Andorra       AND     Europe  Photo … 1.5     No      1       2014    No     
 5 Angola        AGO     Africa  Non-Ph… 1       Yes     1       2004    No     
 6 Anguilla      AIA     Americ… Basic … 0       Yes     0       2019    No     
 7 Antigua and … ATG     Americ… Photo … 1       Yes     1       2001    No     
 8 Argentina     ARG     Americ… Photo … 5       Yes     1       2012    Yes    
 9 Armenia       ARM     Asia    Photo … 3       Yes     1       2016    No     
10 Aruba         ABW     Americ… Photo … 4       Yes     2       1987    No     
# … with 239 more rows, 3 more variables:
#   `Does the country enforce compulsory voting?` <chr>,
#   `Registration Type Law` <chr>,
#   `Does the country have compulsory national ID cards?` <chr>, and
#   abbreviated variable names ¹​`Country Name`, ²​`Country ID`, ³​Continent,
#   ⁴​`Voter ID Law Type`, ⁵​`Number of Different IDs Allowed to Prove Identity`,
#   ⁶​`Does the electoral law provide an exhaustive list of different IDs voters can present?`, …

Finally, we have finished tidying up the data!

Explain the Data

Tom Barton, a Politics postgraduate student at Royal Holloway, University of London, compiled voter ID laws by country and by US state. Voter ID laws, in this sense, are the requirements voters must meet to prove their identity in order to vote. Some countries and states, such as Massachusetts, Dominica, and New Zealand, don't require any form of ID aside from basic personal details. Others, such as Wisconsin, Texas, and Yemen, are more restrictive and require a photo ID. Thankfully, most of the columns are self-explanatory.

Where voters must show ID, the dataset records how many different kinds of ID are accepted, in the column "Number of Different IDs Allowed to Prove Identity". According to Barton, this was an open-ended question, as the values that are not whole numbers suggest (for example, 1.5 for Algeria and Andorra). In the same column, places that don't require ID to vote are assigned the value 0; in other words, voters there don't need to bring an ID to the polls.
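One way to list the entries that are not whole numbers is to convert the column back to numbers and keep values with a fractional part. The "No Data" step above turned this column into text, so the sketch first converts it with as.numeric(). That call makes "No Data" NA with a warning, hence the suppressWarnings(). The rows reuse values from the printout above, plus one hypothetical row with no data.

```r
library(dplyr)

ids <- tibble(country = c("Afghanistan", "Albania", "Algeria", "Andorra",
                          "Anguilla", "Hypothetical"),
              num_ids = c("1", "2", "1.5", "1.5", "0", "No Data"))

flagged <- ids %>%
  mutate(num_ids = suppressWarnings(as.numeric(num_ids))) %>%  # "No Data" -> NA
  filter(num_ids %% 1 != 0)  # keep fractional values; NA rows are dropped
```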

Lastly, there are three categories of voter registration:

- Laissez-Faire Registration: registering to vote is optional.
- Assisted Registration: one can register through various means, such as public services (e.g., Massachusetts).
- Automatic Registration: voters are registered automatically once they reach voting age (usually 18).

Questions to Ask Ourselves

What do you think is the best way to prevent voter fraud?

Do you think more voter ID laws would protect democracy and the integrity of elections?

Do you think more voter ID laws would stifle civic participation and harm democracy?

Do you think fewer voter ID laws would increase voter turnout?

Do you think fewer voter ID laws would increase the chances of voter fraud?

What do you think is the best way to increase voter turnout?

Visualize the Data

Code
Voter_ID %>%
  count(`Registration Type Law`)
# A tibble: 4 × 2
  `Registration Type Law`        n
  <chr>                      <int>
1 Assisted Registration         30
2 Automatic Registration       128
3 Laissez-Faire Registration    87
4 No Data                        4
Code
Voter_Reg <- data.frame(
  Voter.Registration.Law = factor(c("Laissez-Faire Registration", "Automatic Registration", "Assisted Registration")),
  # counts copied from the count() output above ("No Data" excluded)
  Countries.And.Territories = c(87, 128, 30)
  )
Voter_Reg
      Voter.Registration.Law Countries.And.Territories
1 Laissez-Faire Registration                        87
2     Automatic Registration                       128
3      Assisted Registration                        30
Code
ggplot(data = Voter_Reg, aes(x = Voter.Registration.Law, y = Countries.And.Territories)) +
    geom_col()  # same as geom_bar(stat = "identity")
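Typing counts into a data frame by hand is error-prone, because it is easy for the typed numbers to drift from the count() table. A minimal sketch of an alternative pipes count()'s result straight into geom_col(), so the bar heights always come from the data. The tibble is a made-up stand-in; with the real data you would start from Voter_ID.

```r
library(dplyr)
library(ggplot2)

# Made-up stand-in for Voter_ID with only the registration column
reg <- tibble(`Registration Type Law` = c("Automatic Registration",
                                          "Automatic Registration",
                                          "Assisted Registration",
                                          "Laissez-Faire Registration"))

# count() returns one row per category with its frequency in column n
reg_counts <- count(reg, `Registration Type Law`)

reg_plot <- ggplot(reg_counts, aes(x = `Registration Type Law`, y = n)) +
  geom_col() +
  labs(x = "Voter registration law",
       y = "Countries, territories, and US states")
```

Equivalently, geom_bar() with no y aesthetic counts the rows itself, for example `ggplot(Voter_ID, aes(x = `Registration Type Law`)) + geom_bar()`. Filter out "No Data" first if that bar is not wanted.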

Code
Voter_ID %>%
  count(`Voter ID Law Type`)
# A tibble: 4 × 2
  `Voter ID Law Type`                    n
  <chr>                              <int>
1 Basic Personal Details                36
2 No Data                                3
3 Non-Photo ID Required or Requested    31
4 Photo ID Required                    179
Code
Voter_ID_Type <- data.frame(
  Voter.ID.Type =factor(c("Basic Personal Details","Non-Photo ID Required or Requested","Photo ID Required")),  
  Countries.And.Territories=c(36,31,179)
  )
Voter_ID_Type
                       Voter.ID.Type Countries.And.Territories
1             Basic Personal Details                        36
2 Non-Photo ID Required or Requested                        31
3                  Photo ID Required                       179
Code
ggplot(data = Voter_ID_Type, aes(x = Voter.ID.Type, y = Countries.And.Territories)) +
    geom_col()  # same as geom_bar(stat = "identity")
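The bar chart shows raw counts. Converting them to shares makes the comparison easier to state. A minimal sketch that reuses the counts from the count(`Voter ID Law Type`) output above and drops the 3 rows with no data:

```r
library(dplyr)

# Counts copied from the count(`Voter ID Law Type`) output above
id_counts <- tibble(
  law_type = c("Basic Personal Details", "No Data",
               "Non-Photo ID Required or Requested", "Photo ID Required"),
  n = c(36, 3, 31, 179)
)

id_shares <- id_counts %>%
  filter(law_type != "No Data") %>%  # drop rows without a coded law type
  mutate(share = n / sum(n))         # proportion of the 246 coded rows
```

By this arithmetic, photo ID is required in 179 of the 246 rows with data, roughly 73%. The rows mix countries, territories, and US states, so this is not a share of countries alone.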

Source Code
---
title: "HW 2 "
author: "Kristin Abijaoude"
description: "More data wrangling: pivoting"
date: "10/09/2022"
format:
  html:
    toc: true
    code-fold: true
    code-copy: true
    code-tools: true
categories:
  - hw2
  - kristin_abijaoude
  - voteridlaws
---
```{r}
#| label: setup
#| warning: false
#| message: false

library(tidyverse)

knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE)
```

# What are we looking at today?

Today, we are looking at voter ID laws by country and by US state. I downloaded the dataset from Tom Barton of the Department of Politics, International Relations and Philosophy at Royal Holloway, University of London. I found it through Data Is Plural, a newsletter and archive of interesting datasets.

To read the CSV file, we use the read_csv() function from readr, which is loaded with the tidyverse. The chunk below shows the call.

```{r}
Voter_ID <- read_csv("_data/dataverse_files/cvil_22_09_08.csv")
Voter_ID
```

Voter ID laws differ from place to place. This dataset covers voting laws from around the globe, from our own commonwealth of Massachusetts to Afghanistan, Norway, and Zimbabwe. Below, we check the dimensions of the dataset and the column names, which we will fix later. There are 249 rows and 16 columns. Each row represents a country, territory, or US state. The columns cover identifiers such as the country name and code, along with features of each place's voting laws, from compulsory national ID cards to voter ID requirements.

```{r}
# Components of the Dataset

nrow(Voter_ID)

ncol(Voter_ID)

dim(Voter_ID)

colnames(Voter_ID)
```

# Tidy the Data

To begin the tidying process, we will rename the columns. The dataset came as a ZIP file that included a PDF codebook listing each column and its label, and I use that PDF as my guide while tidying. First, summary() gives a quick overview of each column's type, range, and number of missing values.

```{r}
summary(Voter_ID)
```

## Rename Variables

```{r}
Voter_ID <- Voter_ID %>%
  rename("Country Name" = cntry,
         "Country ID" = cntry_cd,
         "Voter ID Law Type" = id_type,
         "Number of Different IDs Allowed to Prove Identity" = num_id,
         "Does the electoral law provide an exhaustive list of different IDs voters can present?" = exhaust,
         "Minimum Number of IDs Required by Law" = min_id,
         "Year of Current Law Enforced" = law_yr,
         "Does the country have compulsory voting?" = cmp_vt,
         "Does the country enforce compulsory voting?" = cmp_enf,
         "Does the country have compulsory national ID cards?" = cmp_id,
         "Registration Type Law" = reg_law,
         "Continent" = continent)
Voter_ID
```
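`rename()` takes `new_name = old_name` pairs. The new names here contain spaces and question marks, which makes them non-syntactic, so every later reference must wrap them in backticks. A small sketch on a made-up tibble that borrows two of the raw codebook names (the places and years are invented):

```r
library(dplyr)

# Made-up tibble; only the column names come from the codebook
toy <- tibble(cntry = c("Country A", "Country B"), law_yr = c(2020, 2001))

# new name = old name; quoting the new name on the left is allowed
toy <- toy %>%
  rename("Country Name" = cntry,
         "Year of Current Law Enforced" = law_yr)

names(toy)

# Non-syntactic names must be backtick-quoted when used afterwards
recent <- toy %>% filter(`Year of Current Law Enforced` > 2010)
recent
```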

Notice that I did not rename every variable, because some simply repeat information held in other variables. Since they are unnecessary, I will remove them with `select()`, putting a minus sign in front of `c()` and listing the variables to drop inside `c()`.

## Remove Unneeded Variables

```{r}
Voter_ID <- Voter_ID %>%
  select(-c(id_type_lab, us_dum, reg_dev, reg_lab))
Voter_ID
```
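Inside `select()`, a minus sign in front of `c()` drops every column listed in it and keeps the rest in their original order. A minimal sketch on a made-up one-row tibble whose column names are borrowed from the real data:

```r
library(dplyr)

# Made-up tibble; only the column names come from the dataset
toy <- tibble(cntry = "Country A", id_type = 3,
              id_type_lab = "label", us_dum = 0)

# Drop the two listed columns, keep everything else
trimmed <- toy %>% select(-c(id_type_lab, us_dum))
names(trimmed)
```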

Next, we will replace the numeric codes in several variables with readable labels, using `mutate()` together with `recode()`. Let's start with continents.

## Recode Continent

```{r}
Voter_ID <- Voter_ID %>%
  mutate(Continent=recode(Continent, 
                            `1` = "Africa",
                            `2` = "Americas",
                            `3` = "Asia",
                            `4` = "Europe",
                            `5` = "Oceania"))
Voter_ID
```
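`recode()` matches each backtick-quoted name against the numeric code and returns the corresponding label. A named lookup vector in base R does the same job. The sketch below, on a made-up vector of codes, checks that both approaches agree. In dplyr 1.1.0 and later, `recode()` is marked as superseded in favour of `case_match()`, although it still works.

```r
library(dplyr)

# Made-up continent codes, not taken from the dataset
codes <- c(4, 1, 5, 2)

# dplyr: names are the old codes, values are the new labels
via_recode <- recode(codes,
                     `1` = "Africa", `2` = "Americas", `3` = "Asia",
                     `4` = "Europe", `5` = "Oceania")

# Base R alternative: index a named character vector by the codes
continent_labels <- c("1" = "Africa", "2" = "Americas", "3" = "Asia",
                      "4" = "Europe", "5" = "Oceania")
via_lookup <- unname(continent_labels[as.character(codes)])

via_recode
identical(via_recode, via_lookup)
```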

Next, we will recode the values in compulsory voting laws.

## Is voting compulsory in your country? Does your country enforce compulsory voting?
```{r}
# The pipe already passes Voter_ID as mutate()'s first argument,
# so it is not repeated inside mutate()
Voter_ID <- Voter_ID %>%
  mutate(`Does the country have compulsory voting?` = recode(`Does the country have compulsory voting?`, `0` = "No", `1` = "Yes"),
         `Does the country enforce compulsory voting?` = recode(`Does the country enforce compulsory voting?`, `0` = "No", `1` = "Yes"))

Voter_ID
```
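Several columns in this dataset share the same 0 = No, 1 = Yes coding. Instead of writing one `recode()` call per column, `across()` can apply the same recode to several columns inside a single `mutate()`. A sketch on a made-up tibble that uses the raw codebook column names:

```r
library(dplyr)

# Made-up 0/1 values; the column names are the raw codebook names
toy <- tibble(cmp_vt = c(0, 1, 1), cmp_enf = c(0, 0, 1))

# Apply the same 0/1 -> No/Yes recode to every listed column
toy <- toy %>%
  mutate(across(c(cmp_vt, cmp_enf), ~ recode(.x, `0` = "No", `1` = "Yes")))

toy
```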

Next, we recode the type of voter ID law, that is, what voters must present to prove their identity. The dataset uses three categories: basic personal details, non-photo ID required or requested, and photo ID required.

## Recode Voter ID Type laws
```{r}
Voter_ID <- Voter_ID %>%
  mutate(`Voter ID Law Type` 
         = recode(`Voter ID Law Type`,
                  `1` = "Basic Personal Details",
                  `2` = "Non-Photo ID Required or Requested",
                  `3` = "Photo ID Required"))

Voter_ID
```

Another question about voter ID is whether the electoral law provides an exhaustive list of the IDs voters can present, that is, whether it spells out every acceptable type of ID. Where the list is not exhaustive, documents beyond those named in the law may also be accepted.

```{r}
Voter_ID <- Voter_ID %>%
  mutate(`Does the electoral law provide an exhaustive list of different IDs voters can present?`
         = recode(`Does the electoral law provide an exhaustive list of different IDs voters can present?`,
                  `0` = "No",
                  `1` = "Yes"))

Voter_ID
```

Next, does the country have compulsory national ID cards? Where voters must show some form of ID, some countries issue national ID cards that can be presented at the polling station.

```{r}
Voter_ID <- Voter_ID %>%
  mutate(`Does the country have compulsory national ID cards?`
         = recode(`Does the country have compulsory national ID cards?`,
                  `0` = "No",
                  `1` = "Yes",
                  "#N/A" = "No Data"))

Voter_ID
```
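The `"#N/A" = "No Data"` entry only works because this column was presumably read in as text: on a character vector, `recode()` matches the names against the strings exactly, so `` `0` `` matches the string "0" and `"#N/A"` matches the spreadsheet error text. A sketch on made-up raw values:

```r
library(dplyr)

# Made-up raw values as they might appear in a text column
raw <- c("0", "1", "#N/A", "1")

# Names are matched against the strings exactly
recoded <- recode(raw, `0` = "No", `1` = "Yes", "#N/A" = "No Data")
recoded
```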

Lastly, let's recode voter registration, which is in three categories:

1. Laissez-Faire: registering to vote is optional
2. Assisted: you can register to vote through public services
3. Automatic: you are automatically registered to vote once you've reached voting age

## Types of voter registration
```{r}
Voter_ID <- Voter_ID %>%
  mutate(`Registration Type Law`
         = recode(`Registration Type Law`,
                  `1` = "Laissez-Faire Registration",
                  `2` = "Assisted Registration",
                  `3` = "Automatic Registration"))

# Sanity Check
Voter_ID
```
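`recode()` returns plain text, so a later bar chart sorts these labels alphabetically (Assisted, Automatic, Laissez-Faire) rather than in the codebook's 1, 2, 3 order. One option, sketched here on made-up codes, is dplyr's `recode_factor()`, which returns a factor whose levels follow the order of the replacements:

```r
library(dplyr)

# Made-up registration codes, not taken from the dataset
toy <- tibble(reg_law = c(3, 1, 2, 3))

# Levels follow the order of the replacements (codebook order),
# not alphabetical order
toy <- toy %>%
  mutate(reg_law = recode_factor(reg_law,
                                 `1` = "Laissez-Faire Registration",
                                 `2` = "Assisted Registration",
                                 `3` = "Automatic Registration"))

levels(toy$reg_law)
```

If you use this approach, label missing values before converting to a factor. Assigning a text value that is not one of a factor's levels yields `NA` (with a warning) rather than the new label.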

I replace the remaining NAs with the label "No Data", but only in the text (categorical) columns. Writing a text label into a numeric column would turn every number in that column into text, so any numeric columns keep `NA`.

```{r}
# Label missing values in character columns only; numeric columns stay numeric
Voter_ID <- Voter_ID %>%
  mutate(across(where(is.character), ~ replace_na(.x, "No Data")))

Voter_ID
```
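It matters which columns receive the "No Data" label because R vectors hold a single type: assigning a text value into a numeric vector coerces the whole vector to character, after which the numbers can no longer be summarised or plotted as numbers. A sketch on made-up vectors, showing the coercion and `tidyr::replace_na()` applied to a text vector where the label fits:

```r
library(tidyr)

# Made-up counts of accepted ID types, with one missing value
num_ids <- c(3, NA, 5)

# Writing "No Data" into the NA slot turns every element into text
as_text <- num_ids
as_text[is.na(as_text)] <- "No Data"
class(as_text)
as_text

# A text vector can take the label without changing type
law_type <- c("Photo ID Required", NA)
law_type <- replace_na(law_type, "No Data")
law_type
```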

```{r}
Voter_ID <- relocate(Voter_ID, "Continent", .after = "Country ID")
  
Voter_ID
```


Finally, we have finished tidying up the data! 

# Explain the Data

Tom Barton, a Politics postgraduate student at Royal Holloway, University of London, compiled voter ID laws by country and by US state. Here, voter ID laws are the rules on what voters must do to prove their identity in order to vote. Some countries and states, such as Massachusetts, Dominica, and New Zealand, require nothing beyond basic personal information, while others, such as Wisconsin, Texas, and Yemen, are more restrictive and require a photo ID. Thankfully, most of the columns are self-explanatory.

Where an ID is required to vote, the dataset records how many different types of ID are accepted, in the column "Number of Different IDs Allowed to Prove Identity". According to Barton, this was an open-ended question, which is why some values are not whole numbers. In the same column, places that do not require ID to vote are assigned the value 0; in other words, voters there do not need to bring an ID to the polls.
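Both features of the ID-count column described above can be picked out with `filter()`: a value of 0 marks a place where no ID is needed, and `x %% 1 != 0` flags values that are not whole numbers. A sketch on a made-up column standing in for `Number of Different IDs Allowed to Prove Identity`; the approach assumes the column is still numeric.

```r
library(dplyr)

# Made-up places and values, not taken from the dataset
toy <- tibble(place = c("A", "B", "C", "D"),
              num_id = c(0, 4, 2.5, 7))

# Places where no ID is needed at the polls (coded 0)
no_id_needed <- toy %>% filter(num_id == 0)

# Places whose count is not a whole number
non_whole <- toy %>% filter(num_id %% 1 != 0)

no_id_needed
non_whole
```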

Lastly, there are three categories of voter registration. Under Laissez-Faire Registration, registering to vote is optional; under Assisted Registration, one can register to vote through various means, such as when using public services (e.g., Massachusetts); and under Automatic Registration, voters are registered automatically once they reach voting age (usually 18).

# Questions to Ask Ourselves

What do you think is the best way to prevent voter fraud?

Do you think more voter ID laws would protect democracy and the integrity of elections?

Do you think more voter ID laws would stifle civic participation and harm democracy?

Do you think fewer voter ID laws would increase voter turnout?

Do you think fewer voter ID laws would increase the chances of voter fraud?

What do you think is the best way to increase voter turnout?

# Visualize the Data


```{r}
# Count countries and territories per registration type directly from the data,
# leaving out rows without registration data
Voter_Reg <- Voter_ID %>%
  filter(`Registration Type Law` != "No Data") %>%
  count(`Registration Type Law`, name = "Countries and Territories")
Voter_Reg

# geom_col() uses the counts as bar heights
ggplot(data = Voter_Reg, aes(x = `Registration Type Law`, y = `Countries and Territories`)) +
  geom_col()
```

```{r}
# Count countries and territories per voter ID law type directly from the data,
# leaving out rows without ID-law data
Voter_ID_Type <- Voter_ID %>%
  filter(`Voter ID Law Type` != "No Data") %>%
  count(`Voter ID Law Type`, name = "Countries and Territories")
Voter_ID_Type

# geom_col() uses the counts as bar heights
ggplot(data = Voter_ID_Type, aes(x = `Voter ID Law Type`, y = `Countries and Territories`)) +
  geom_col()
```
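When the counts come from `count()`, `geom_col()` draws one bar per category with the count as its height, and base R's `reorder()` can sort the bars by that count. A sketch on a made-up tibble with one row per place:

```r
library(dplyr)
library(ggplot2)

# Made-up rows, one per country or territory
toy <- tibble(id_type = c("Photo ID Required", "Photo ID Required",
                          "Basic Personal Details",
                          "Non-Photo ID Required or Requested",
                          "Photo ID Required"))

counts <- toy %>% count(id_type, name = "n_places")

# Bars sorted from least to most common category
p <- ggplot(counts, aes(x = reorder(id_type, n_places), y = n_places)) +
  geom_col() +
  labs(x = "Voter ID law type", y = "Countries and territories")
p
```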